Description

Method And Apparatus For Structuring Texts

[0001] The present application hereby claims priority
under 35 U.S.C. §119 on German patent application
number DE 102 45 876.6 filed September 30, 2002, the
entire contents of which are hereby incorporated herein
by reference.

Field of the Invention

[0002] The invention generally relates to a method and
apparatus for converting unstructured text information 
into a structured format. 

Background of the Invention

[0003] Particularly in medical engineering, many free
text reports are produced today which are recorded in 
the computer using dictaphones and/or voice recognition 
technologies, for example. The problem when handling 
these reports is that automatic access to small 
information parts, "atomic information", is almost 
impossible because the content contains no or just a 
very coarse structure. Free text reports are therefore 
very unsuitable for structured presentation and 
evaluation of the information. 

[0004] In such free text reports, only integrated
information is processed. This information cannot be
used for automatic evaluations; the information it
contains is thus lost for this
purpose. This problem is growing as the need for access 
to the atomic information, for example for the purpose 
of coding, increases. 

[0005] Aho, Alfred V. et al., "Compilers - Principles,
Techniques and Tools", Addison Wesley, Reading, 
Massachusetts, 1986, pages 4 to 11, the entire contents 
of which are incorporated herein by reference, 
describes the principle of parsing. 

[0006] Wormek A.K. et al., "SAM: Speech-Aware
Applications in Medicine to Support Structured Data
Entry", the entire contents of which are incorporated
herein by reference, discloses a method for the 
structured input of data by voice. 

[0007] In these documents, unstructured text information
is converted into a structure on the basis of the
derivation of one structure from another. These
resultant structures also cannot be used for automatic
evaluations.

SUMMARY OF THE INVENTION

[0008] An embodiment of the invention is based on
an object of providing a method and an apparatus which
allow simple, automated conversion of unstructured text
information from free text reports into a structured,
evaluatable format.

[0009] An embodiment of the invention achieves an
object via a method having the following steps:

a) structuring rules for structuring the unstructured 
text information are input, 

b) unstructured text information is recorded, 

c) the unstructured text information is parsed in order 
to produce small text fragments,

d) text units of the unstructured text information are 
searched for text fragments defined in the 
structuring rules, 

e) the text fragments of the unstructured text
information are structured on the basis of 
conditions stipulated in the structuring rules. 
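
Steps a) to e) can be pictured with a minimal sketch. It assumes that a rule is simply a mapping from a text fragment to an element name and that "parsing" means splitting at sentence-like boundaries; the names `parse`, `structure`, `rules` and `report` are hypothetical and do not come from the application.

```python
import re

def parse(text):
    """Step c): break the text into small units (assumed here: split after '.' or ':')."""
    return [unit.strip() for unit in re.split(r"(?<=[.:])\s+", text) if unit.strip()]

def structure(text, rules):
    """Steps d) and e): search each unit for rule fragments and label the hits."""
    structured = []
    for unit in parse(text):
        for fragment, element in rules.items():
            if fragment.lower() in unit.lower():
                structured.append((element, unit))
    return structured

# Step a): structuring rules are input (hypothetical fragment -> element mapping).
rules = {"diaphoresis": "Indication", "cocaine abuse": "History entry"}

# Step b): unstructured text information is recorded.
report = "Indication: Diaphoresis. History: further cocaine abuse."

structured = structure(report, rules)
print(structured)
```

A real system would need far richer rules; the sketch only shows how the five steps hand data to one another.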

[0010] The structuring rules to be defined parse the free
text report, i.e. break it down into smaller units, and 
convert it into a structure which allows a program to 
evaluate this information. Such a rule contains 
information relating to the text fragments for which 
the free text report needs to be searched, which 
structure element is represented thereby, and 
additional information about how the structure needs to 
be set up. 
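
One possible way to hold the three pieces of information named above (the fragment to search for, the structure element it represents, and how the structure is to be set up) is a small record type. This is only an illustrative sketch; the class `StructuringRule` and its field names are assumptions, not part of the application.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class StructuringRule:
    fragment: str                 # text fragment the report is searched for
    element: str                  # structure element represented by the fragment
    parent: Optional[str] = None  # assumed hint on where the element is placed

    def matches(self, text: str) -> bool:
        """Case-insensitive containment test (an assumed matching strategy)."""
        return self.fragment.lower() in text.lower()

# Hypothetical example rules in the spirit of the description.
example_rules = [
    StructuringRule("Indication", "Indications"),
    StructuringRule("Diaphoresis", "Indication", parent="Indications"),
]

hits = [r.element for r in example_rules if r.matches("Indication: Diaphoresis.")]
print(hits)
```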

[0011] In line with the invention, unstructured text
information can be recorded in step b) by a microphone, 
with a voice recognition program being used for 
conversion into unstructured text information. 

[0012] Advantageously, the structuring rules can contain
information relating to the text fragments for which 
the free text report needs to be searched, about which 
structure element is represented thereby and about how 
the structure needs to be set up. 

[0013] An embodiment of the invention achieves an
object for the apparatus by way of an input
apparatus for unstructured text information, an input
apparatus and a memory apparatus for structuring rules,
an extraction apparatus for small text units from the
unstructured text information, a structuring apparatus
for producing structured text information on the basis
of the structuring rules, and an evaluation apparatus
for the text units in the structured text information.

[0014] Evaluatable unstructured text information can be
input directly if the input apparatus for unstructured
text information has an associated apparatus for voice
recognition.

[0015] It has been found to be advantageous if DICOM-SR 
or XML is used as structured format for the structured 
text information. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0016] The present invention will become more fully
understood from the detailed description of preferred
embodiments given hereinbelow and the accompanying
drawings, which are given by way of illustration only
and thus are not limitative of the present invention,
and wherein:

Figure 1 shows an apparatus in accordance with an
embodiment of the invention for structuring
texts, and

Figure 2 shows a method in accordance with an
embodiment of the invention for structuring
texts.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0017] Figure 1 shows an apparatus in accordance with an
embodiment of the invention for structuring texts. The
apparatus can be implemented in a personal computer
(PC), for example. A keyboard 1, for example, may be
used for inputting structuring rules and possibly free
text reports. In addition, the apparatus can have a
voice input apparatus 2, for example a microphone or a
cassette player, which can be used to input the free
text reports into the PC. The
voice input apparatus 2 has an apparatus 3 for voice 
recognition, for example with a voice recognition 
program, connected to it which can be used to convert 
the spoken free text reports into text information. 

[0018] The keyboard 1 is connected to a memory apparatus
4 for structuring rules and to a memory apparatus 5 for 
text information, to which the apparatus 3 for voice 
recognition is also connected. The memory apparatus 5 
for text information has an extraction apparatus 6 
connected to it which recognizes and identifies small 
text units from the unstructured text information. The 
extraction apparatus 6 and the memory apparatus 4 for 
the structuring rules have a structuring apparatus 7 
for producing structured text information connected to 
them which converts the extracted text units into a 
structured format on the basis of the stipulated and 
stored structuring rules. The structuring apparatus 7 
has an evaluation apparatus 8 connected to it which 
allows a check for small, structured text units for 
further evaluation. 

[0019] In a medical facility, free text reports are
recorded, for example using a dictaphone, and are later
transferred to the computer by a secretary using a
writing program via the keyboard 1. A free text report
can also be converted into a written text by the
apparatus 3 for voice recognition, using an appropriate
voice recognition program, the free text report being
able to be input directly into a personal computer by
means of dictation or subsequently using a player for
dictation cassettes.

[0020] To allow later evaluations of the stocks of data
produced in this manner, the free text reports are
converted into a structured format, for example
DICOM-SR or XML, in addition to their original format.
For this purpose, rules are defined which stipulate the
systematics of conversion.

[0021] The starting point is unstructured text
information 9, shown in Figure 2, which has been
produced by way of dictation or free text input.
This text information 9 is used as input for an 
apparatus which is intended to convert this 
unstructured text information 9 into a structured form. 

[0022] Figure 2 gives the following as an example of
unstructured text information 9:

Indication: Diaphoresis. Rule out abnormalities of
regional wall movements. Check hypertonic
cardiomyopathy. Rule out myocardial infarction.
Assess the left of the sputum component from the
left ventricle. Rule out an aneurysm of the left
ventricle. History: other relevant histories
include: further cocaine abuse. Previous CV
procedures:

Studyinfo. The study was carried out under general
anesthesia.

[0023] To convert this unstructured text information 9
into a structured form, structuring rules 10 are input 
into this apparatus using the keyboard 1 and are stored 
in the memory apparatus 4, these structuring rules 
forming the basis of the conversion. 

[0024] These structuring rules 10 define those text
fragments for which the text needs to be searched and
what result the finding of such a text fragment has in
the conversion. In the example described below, finding
the text fragment "Indication", for example, signifies
that a new element which describes an indication is
inserted into the structure.

[0025] The text below gives examples of such structuring
rules 10, which are shown in Figure 2. The general
basis is that structuring rules 10 are defined which 
stipulate, on the basis of the finding of text 
fragments, how unstructured text information 9 is 
transferred to a structured form. 

[0026] If the text contains the word "Indication", then
the word needs to be handled with open actions under
element "Indication". The same applies for the word
"History" as "History" element and for "Studyinfo" as
"Studyinfo" element.

[0027] If the text contains the word "Diaphoresis", then
it needs to be inserted as an action under element
"Indication". The term "Cocaine abuse" in the text
needs to be inserted under element "History entry". The
term "General anesthesia" needs to be inserted under
element "Studyinfo".

[0028] These and other structuring rules 10 which have
been input once, but can be changed at any time, are 
used to put unstructured text information 9 from the 
free text report into a structured form, so that the 
structured text information 11 which has now been 
obtained and which is described below can be searched 
for particular terms. 

<Report>

<Indications>

<Indication> Diaphoresis </Indication>. Rule out
abnormalities of regional wall movements. Check
hypertonic cardiomyopathy. Rule out myocardial
infarction. Assess the left of the sputum component
from the left ventricle. Rule out an aneurysm of
the left ventricle.

</Indications>

<History>

Other relevant histories include: further
<History entry> Cocaine abuse </History entry>. Previous CV
procedure(s):

</History>

<Studyinfo>

The study was carried out under <Studyinfo> general
anesthesia </Studyinfo>.

</Studyinfo>

</Report>
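
The conversion from the free text report to the tagged report can be sketched as follows. This is a hedged illustration rather than the disclosed implementation: the dictionaries `SECTION_RULES` and `TERM_RULES`, the function names, and the element names `HistoryEntry` and `StudyinfoEntry` are assumptions (well-formed XML does not permit spaces in element names, so "History entry" is written without one here), and the sample report is abridged.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed rule tables: section keyword -> section element, term -> entry element.
SECTION_RULES = {"Indication": "Indications", "History": "History", "Studyinfo": "Studyinfo"}
TERM_RULES = {
    "Diaphoresis": "Indication",
    "cocaine abuse": "HistoryEntry",
    "general anesthesia": "StudyinfoEntry",
}

def tag_terms(body):
    """Wrap every rule term found in a section body in its element."""
    out = escape(body.strip())
    for term, element in TERM_RULES.items():
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        out = pattern.sub(lambda m, e=element: f"<{e}>{m.group(0)}</{e}>", out)
    return out

def structure_report(text):
    """Split the report at section keywords and tag the terms inside each section."""
    keys = "|".join(re.escape(k) for k in SECTION_RULES)
    parts = re.split(rf"\b({keys})[:.]\s*", text)
    sections = []
    for keyword, body in zip(parts[1::2], parts[2::2]):
        element = SECTION_RULES[keyword]
        sections.append(f"<{element}>{tag_terms(body)}</{element}>")
    return "<Report>" + "".join(sections) + "</Report>"

sample = (
    "Indication: Diaphoresis. Rule out myocardial infarction. "
    "History: other relevant histories include: further cocaine abuse. "
    "Studyinfo. The study was carried out under general anesthesia."
)
xml_text = structure_report(sample)
root = ET.fromstring(xml_text)  # raises ParseError if the result is not well-formed
print(xml_text)
```

Parsing the result with `ET.fromstring` is a cheap check that the generated structure is well-formed before it is stored or evaluated.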

[0029] In this case, the invention involves unstructured 
text information being converted into a structure on 
the basis of the rule-based interpretation of contents. 

[0030] Thus, by way of example, two documents can contain
the following text passages: 

a) "The patient was subjected to an extensive 
examination. An intestinal tumor was diagnosed." 

b) "Following a CT-based examination, a tumor in the 
intestinal tract was diagnosed". 

[0031] To structure the diagnosis, the following rules
can be applied:

1. If a sentence contains the words "diagnosed",
"diagnostic result" or "diagnosis", then it
contains information relating to diagnosis.

1.1. If the same sentence contains the word "tumor" or
"malignant tumor", a tumor has been discovered.

1.1.1. If the same sentence contains the word
"intestine" or "intestinal tract", then
intestinal cancer has been diagnosed.

1.2. If the sentence contains the word "intestinal
tumor" or "intestinal cancer", then intestinal
cancer has been diagnosed.

[0032] The same text fragment is analyzed in this manner
from a wide variety of aspects. The knowledge obtained 
from these analyses is then converted into 
corresponding structures: 

<Diagnosis> 

<Code> DF-0044A </Code>

<Meaning> Intestinal cancer </Meaning> 

</Diagnosis> 
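
Rules 1, 1.1, 1.1.1 and 1.2 can be sketched as nested keyword checks applied sentence by sentence. The function and constant names below are hypothetical, simple substring matching is an assumption, and the diagnosis code is left out because only the meaning can be derived from the rules as stated.

```python
import re
import xml.etree.ElementTree as ET

DIAGNOSIS_WORDS = ("diagnosed", "diagnostic result", "diagnosis")   # rule 1
TUMOR_WORDS = ("tumor", "malignant tumor")                          # rule 1.1
INTESTINE_WORDS = ("intestine", "intestinal tract")                 # rule 1.1.1
DIRECT_WORDS = ("intestinal tumor", "intestinal cancer")            # rule 1.2

def contains_any(sentence, words):
    lowered = sentence.lower()
    return any(word in lowered for word in words)

def diagnose(sentence):
    """Return the diagnosed meaning for one sentence, or None if no rule fires."""
    if not contains_any(sentence, DIAGNOSIS_WORDS):
        return None
    if contains_any(sentence, DIRECT_WORDS):
        return "Intestinal cancer"
    if contains_any(sentence, TUMOR_WORDS) and contains_any(sentence, INTESTINE_WORDS):
        return "Intestinal cancer"
    return None

def structure_diagnosis(text):
    """Apply the rules to each sentence and build a <Diagnosis> element for the first hit."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        meaning = diagnose(sentence)
        if meaning:
            diagnosis = ET.Element("Diagnosis")
            ET.SubElement(diagnosis, "Meaning").text = meaning
            return diagnosis
    return None

passage_a = ("The patient was subjected to an extensive examination. "
             "An intestinal tumor was diagnosed.")
passage_b = "Following a CT-based examination, a tumor in the intestinal tract was diagnosed."

result_a = structure_diagnosis(passage_a)
result_b = structure_diagnosis(passage_b)
print(ET.tostring(result_a, encoding="unicode"))
```

Both passages reach the same structure by different rule paths: passage a) through rule 1.2, passage b) through rules 1.1 and 1.1.1.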

[0033] It is thus possible to access atomic information
automatically, since the content is given a finely 
structured form by the inventive apparatus. Hence, free 
text reports can also be used for structured 
presentation and automatic evaluation of the 
information. 
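
As an illustration of what automatic access to atomic information could look like once reports are structured, the following sketch filters a small collection of structured reports by diagnosis. The report strings, identifiers and the helper `reports_with_meaning` are invented for the example and are not taken from the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured reports, keyed by an invented report identifier.
structured_reports = {
    "report-1": "<Report><Diagnosis><Meaning>Intestinal cancer</Meaning></Diagnosis></Report>",
    "report-2": "<Report><Indications><Indication>Diaphoresis</Indication></Indications></Report>",
}

def reports_with_meaning(reports, meaning):
    """Return the identifiers of all reports whose diagnosis has the given meaning."""
    hits = []
    for report_id, xml_text in reports.items():
        root = ET.fromstring(xml_text)
        for node in root.iter("Meaning"):
            if (node.text or "").strip() == meaning:
                hits.append(report_id)
                break
    return hits

matches = reports_with_meaning(structured_reports, "Intestinal cancer")
print(matches)
```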



[0034] Exemplary embodiments being thus described, it
will be obvious that the same may be varied in many
ways. Such variations are not to be regarded as a
departure from the spirit and scope of the present
invention, and all such modifications as would be
obvious to one skilled in the art are intended to be